Leveraging resource management for efficient performance of Apache Spark

نویسندگان
چکیده

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Performance Comparison of Apache Spark and Tez for Entity Resolution

Entity Resolution is among the hottest topics in the field of Big data. It finds duplicates in datasets, which actually belong to same entity in the real world. Algorithms that perform Entity Resolution are computation intensive and consume a lot of time especially for large datasets. A lot of research has been conducted for improving Entity Resolution solutions. A number of algorithms are deve...

متن کامل

Efficient iterative virtual screening with Apache Spark and conformal prediction

BACKGROUND Docking and scoring large libraries of ligands against target proteins forms the basis of structure-based virtual screening. The problem is trivially parallelizable, and calculations are generally carried out on computer clusters or on large workstations in a brute force manner, by docking and scoring all available ligands. CONTRIBUTION In this study we propose a strategy that is b...

متن کامل

Approximate Stream Analytics in Apache Flink and Apache Spark Streaming

Approximate computing aims for efficient execution of workflows where an approximate output is sufficient instead of the exact output. The idea behind approximate computing is to compute over a representative sample instead of the entire input dataset. Thus, approximate computing — based on the chosen sample size — can make a systematic trade-off between the output accuracy and computation effi...

متن کامل

MLlib: Machine Learning in Apache Spark

Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning tasks. In this paper we present MLlib, Spark’s open-source distributed machine learning library. MLlib provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives. Shi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Journal of Big Data

سال: 2019

ISSN: 2196-1115

DOI: 10.1186/s40537-019-0240-1